332:525 Homework Set 1
Estimation Problems

1. Recursive Least-Squares (RLS) Estimators: Consider a sequence of iid random variables x_n, n = 0, 1, 2, ..., and form the running average of the first n+1 numbers, considered as an estimate of the mean m = E[x_n]:

m̂_n = (x_0 + x_1 + ... + x_n)/(n+1)

(a) Show that m̂_n is the optimum solution that minimizes the sum of squares:

E_n = Σ_{k=0}^{n} (x_k - m̂)^2

What is the minimized value of E_n?

(b) Moreover, show that m̂_n can be re-expressed in the following time-recursive forms, with the second being a Kalman-type predictor/corrector form:

m̂_n = (n/(n+1)) m̂_{n-1} + (1/(n+1)) x_n
m̂_n = m̂_{n-1} + (1/(n+1)) (x_n - m̂_{n-1})

Note that these recursions connect the optimum least-squares solutions of two different performance indices. Indeed, m̂_n minimizes the performance index E_n, whereas m̂_{n-1} minimizes E_{n-1}, which runs up to time k = n-1.

(c) Show that m̂_n is an unbiased estimator of the mean. Determine the variance of m̂_n, that is, the quantity var(m̂_n) = E[(m̂_n - m)^2], and show that m̂_n is a consistent estimator of the mean. Hint: Show first that

m̂_n - m = (1/(n+1)) Σ_{k=0}^{n} (x_k - m)

and use the assumption that the x_n are iid, which implies the decorrelation condition E[(x_i - m)(x_j - m)] = σ_x^2 δ_ij.

2. RLS Estimators with Forgetting Factor: The RLS estimator m̂_n of the previous problem is appropriate for stationary sequences, that is, those whose statistical characteristics do not change over time. Indeed, the performance index E_n treats all time samples, from the earliest to the latest, on an equal footing. Initially, the estimator m̂_n converges very fast to the optimum value m and then gets stuck at that value, because the Kalman-type gain factor 1/(n+1) that appears in the time-update becomes extremely small with increasing n. If there is a non-stationary change in the statistics and the mean m changes to a new value, the estimator m̂_n will have a very hard time tracking this change.
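The two recursive forms in part (b) are easy to check numerically. A minimal sketch (the function and variable names are mine, not from the problem):

```python
# Check that the Kalman-type recursion m_n = m_{n-1} + (x_n - m_{n-1})/(n+1)
# reproduces the batch running average (x_0 + ... + x_n)/(n+1).
def recursive_mean(xs):
    m = 0.0
    estimates = []
    for n, x in enumerate(xs):
        m = m + (x - m) / (n + 1)   # predictor/corrector update
        estimates.append(m)
    return estimates

xs = [2.0, 4.0, 9.0, 1.0]
batch = [sum(xs[:n + 1]) / (n + 1) for n in range(len(xs))]
rec = recursive_mean(xs)
print(rec)   # matches the batch averages [2.0, 3.0, 5.0, 4.0]
```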
A more appropriate estimator for tracking non-stationary changes in the statistics would be one that places more emphasis on the more recent data and less on the older data. For example, the following weighted version of E_n emphasizes the current samples more and forgets the older ones exponentially fast:

E_n = Σ_{k=0}^{n} λ^{n-k} (x_k - m̂)^2

where the forgetting factor λ must satisfy 0 < λ ≤ 1. Note that λ = 1 recovers the above stationary case.

(a) Determine the optimum m̂_n that minimizes E_n and cast it in a time-recursive form such as:

m̂_n = m̂_{n-1} + b_n (x_n - m̂_{n-1})

How does b_n behave in the limit λ → 1? Show that m̂_n is an asymptotically unbiased estimator of m.

(b) Show that for fairly large values of n and for λ < 1, the estimator satisfies the first-order difference equation (otherwise known as a first-order smoother):

m̂_n = λ m̂_{n-1} + (1 - λ) x_n    (1)

3. RLS Estimators with Forgetting Factor: The first-order smoother estimator of Eq. (1) was obtained for fairly large values of n. However, it can be thought of as a third type of estimator in its own right. Assume, therefore, that Eq. (1) defines m̂_n for all n ≥ 0. Show that it is asymptotically unbiased but not consistent. Indeed, show that in the limit of large n, the variance of m̂_n tends to the finite value:
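The tracking behavior described above can be seen in a few lines. A sketch under assumptions of my choosing (jump size, λ value):

```python
# Sketch: the exponentially weighted estimate m_n = lam*m_{n-1} + (1-lam)*x_n
# tracks a jump in the mean, while the plain running average lags behind.
def ewma(xs, lam):
    m = xs[0]
    for x in xs[1:]:
        m = lam * m + (1 - lam) * x   # first-order smoother, Eq. (1)
    return m

xs = [0.0] * 50 + [10.0] * 50   # mean jumps from 0 to 10 at n = 50
m_fast = ewma(xs, lam=0.8)      # forgets old data quickly
m_avg = sum(xs) / len(xs)       # treats all samples equally
print(m_fast, m_avg)            # smoother is near 10, plain average is 5
```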
var(m̂_n) = E[(m̂_n - E[m̂_n])^2] → ((1 - λ)/(1 + λ)) σ_x^2

However, by choosing λ near 1 it can be made as small as desired, thus providing a good estimator. The trade-off is that the closer λ is to 1, the more sluggish the estimator becomes in tracking non-stationarities.

4. Least-Mean-Square (LMS) Estimators: Consider the theoretical performance index

E(m̂) = E[(x - m̂)^2]    (2)

(a) Differentiating it with respect to m̂, show that E is minimized for the optimum value of the parameter m̂ = m = E[x].

(b) The LMS algorithm is based on the idea of steepest descent, in which m̂ is changed iteratively so that at each iteration the performance index E is decreased, eventually reaching its minimum value. The key condition is to demand that going from one value of m̂ to the next, say m̂ + Δm̂, results in a smaller performance index, that is, E(m̂ + Δm̂) ≤ E(m̂). This can be guaranteed by choosing the change Δm̂ to be proportional to the negative gradient of E, that is (with μ > 0):

Δm̂ = -μ ∂E/∂m̂    (LMS update)

Replace the theoretical gradient by the instantaneous one:

∂E/∂m̂ → ∂E_n/∂m̂ = -2(x_n - m̂_n)

Apply the LMS update to the instantaneous gradient, that is,

m̂_{n+1} = m̂_n + Δm̂_n = m̂_n - μ ∂E_n/∂m̂_n

and show that it can be written in a form similar to the RLS estimator of Eq. (1):

m̂_{n+1} = λ m̂_n + (1 - λ) x_n

where λ = 1 - 2μ. Thus, the LMS and RLS algorithms for the recursive estimation of the mean are essentially equivalent. Note, however, that when adapting more than one parameter, the LMS and RLS algorithms are no longer equivalent, the latter having a much faster learning speed at the expense of higher computational cost.

5. Do Problems 1.9 and 1.10. For Problem 1.10, suppose the mixing parameter ɛ is known in advance. Instead of sending x and y into a correlation canceler, you carry out a preprocessing operation, replacing {x, y} by the signals {x', y'}, where x' = x and y' = y - ɛx, and then send those into a correlation canceler. Determine the optimum canceler weight H'.
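The λ = 1 - 2μ equivalence of Problem 4 can be verified directly. A sketch (step size μ and data chosen arbitrarily):

```python
# Sketch: the LMS update m <- m + 2*mu*(x - m), with lam = 1 - 2*mu,
# coincides step-by-step with the smoother m <- lam*m + (1-lam)*x.
mu = 0.05
lam = 1 - 2 * mu
xs = [1.0, -2.0, 0.5, 3.0, 1.5]

m_lms = m_rls = 0.0
for x in xs:
    m_lms = m_lms + 2 * mu * (x - m_lms)  # steepest-descent (LMS) form
    m_rls = lam * m_rls + (1 - lam) * x   # first-order smoother (RLS) form
print(m_lms, m_rls)  # essentially identical trajectories
```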
Show that now the noise component of x can be canceled completely. Draw a block diagram of all the processing operations.

Note: The circumstances of this problem arise in adaptive antenna sidelobe canceling systems that use linearly polarized antennas. Polarization is used as a discriminant between signal and interference. In this application, the parameter ɛ is related to the known polarization angles of the desired signal. The interference signal is also polarized, with unknown polarization angles relative to the antennas, but that does not matter, because the subsequent adaptive canceler determines them adaptively and cancels the interference completely.

6. (a) Let x̂ be the optimum linear estimate of a scalar x based on the random vector y. Show that x̂ remains invariant under a linear invertible transformation of the observation vector, y = Bz.

(b) Show that E[e x̂] = 0 and E[e^2] = E[e x], where e = x - x̂.

(c) If x is uncorrelated with y, show that x̂ = 0.

7. Let x be a random variable with mean E[x] = m. We wish to estimate x in terms of a zero-mean vector of observations y. Because the mean of x is not zero, we seek an estimate of the form

x̂ = h^T y + b

The b-term is called a bias term. Assume the correlations R = E[y y^T] and r = E[x y] are known. Show that the optimum choices for h and b that minimize the mean-square estimation error E = E[e^2], where e = x - x̂, are

h = R^{-1} r    and    b = m

Note: It is straightforward to reformulate such biased estimates adaptively. They are very common, especially in neural network applications.

8. (a) Show that the optimum estimate of y based on itself is itself, that is, ŷ = y.

(b) Let z = Qy, where Q does not have to be invertible or square. Show that the optimum estimate of z based on y is given by ẑ = Qy, that is, ẑ = z.
(c) Suppose y is divided into two subvectors y_1 and y_2, that is, y = [y_1; y_2]. Using the results of the previous part, or working directly, show that the optimum estimate of y_1 based on y, that is, ŷ_1 = E[y_1 y^T] E[y y^T]^{-1} y, is given simply by ŷ_1 = y_1.

9. (a) A random variable x is related to the random vectors y_1 and y_2 by

x = c_1^T y_1 + c_2^T y_2 + v = [c_1^T, c_2^T][y_1; y_2] + v = c^T y + v

where v is uncorrelated with y_1 and y_2. Show that the best estimate of x based on the combined observation vector y = [y_1; y_2] is given by x̂ = c^T y. Therefore, the y-dependent part of x is completely canceled from the error output e = x - x̂, that is, e = v in this case. (Hint: Show that the solution of the normal equations is h = c.)

(b) Determine the optimum estimate x̂_1 = h_1^T y_1 of x based only on the first observation vector y_1, and show that in this case the y_1-dependent part of x is still canceled completely from the error output e_1 = x - x̂_1, whereas the y_2-dependent part is canceled as much as possible, in the sense that e_1 is given by

e_1 = v + c_2^T (y_2 - ŷ_{2/1})

where

ŷ_{2/1} = E[y_2 y_1^T] E[y_1 y_1^T]^{-1} y_1 = R_21 R_11^{-1} y_1

is the best estimate of y_2 based on y_1. (Hint: Express h_1 in terms of c_1, c_2, R_11, R_21.)

(c) Show that the minimized mean-square error of the above case is given by:

E = E[e_1^2] = σ_v^2 + c_2^T (R_22 - R_21 R_11^{-1} R_21^T) c_2

where R_22 = E[y_2 y_2^T]. Why is the second term in E non-negative?

Note: The results of this problem will be used later to develop guidelines for picking the filter order in adaptive filtering applications.

10. Let

R = [[2, 4, 4], [4, 9, 10], [4, 10, 14]]    (rows listed in order)

be the covariance matrix of y = [y_1; y_2; y_3], assumed to have zero mean. Determine the innovations representation y = Bɛ by carrying out the Gram-Schmidt orthogonalization of the components of y. Then, verify the factorization R_yy = B R_ɛɛ B^T by explicit matrix multiplication. Next, consider the estimation of a random variable x in terms of y. The cross-correlation between x and y is known to be r = E[xy] = [4, 4, 2]^T.
Determine the optimum estimation weights h and g with respect to the correlated basis y and the innovations basis ɛ, that is,

x̂ = h^T y = g^T ɛ

Hint: Use g = D^{-1} L^{-1} r and h = L^{-T} g, where D = R_ɛɛ and L = B.

11. For the previous problem, compute the optimum estimates of x based on the three successively bigger subspaces Y_1 = {y_1}, Y_2 = {y_1, y_2}, Y_3 = {y_1, y_2, y_3}, in the forms

x̂_1 = h_11 y_1 = g_11 ɛ_1
x̂_2 = h_21 y_1 + h_22 y_2 = g_21 ɛ_1 + g_22 ɛ_2
x̂_3 = h_31 y_1 + h_32 y_2 + h_33 y_3 = g_31 ɛ_1 + g_32 ɛ_2 + g_33 ɛ_3

Show that the g-weights are independent of the order, that is, g_pi = g_i, where g_i was found in the previous problem. Show that the above estimates can be recursively constructed by

x̂_1 = g_1 ɛ_1
x̂_2 = x̂_1 + g_2 ɛ_2
x̂_3 = x̂_2 + g_3 ɛ_3

Assuming σ_x^2 = 30, use the recursions E_i = E_{i-1} - g_i^2 E[ɛ_i^2], where E_i = E[e_i^2] = E[(x - x̂_i)^2], to determine the successive estimation errors E_1, E_2, E_3. Note the gradual improvement of the estimate as the number of observations is increased. Finally, determine the predictions ŷ_{2/1} and ŷ_{3/2} of y_2 and y_3 based on the past subspaces Y_1 and Y_2, respectively, write them in the forms

ŷ_{2/1} = a_21 y_1 = b_21 ɛ_1
ŷ_{3/2} = a_31 y_1 + a_32 y_2 = b_31 ɛ_1 + b_32 ɛ_2

and show that the inverse innovations matrix L^{-1} = B^{-1} can be expressed as:

L^{-1} = [[1, 0, 0], [b_21, 1, 0], [b_31, b_32, 1]]^{-1} = [[1, 0, 0], [-a_21, 1, 0], [-a_31, -a_32, 1]]
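The Gram-Schmidt construction of the factorization R = B R_ɛɛ B^T is equivalent to an LDL^T decomposition, which can be sketched numerically. The covariance matrix below is an illustrative stand-in of my choosing, not the one from the problem:

```python
# Lower-triangular innovations factorization R = B * D * B^T computed by
# Gram-Schmidt on the components of y (equivalently, LDL^T decomposition).
def ldl(R):
    n = len(R)
    B = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    D = [0.0] * n
    for i in range(n):
        for j in range(i):
            s = sum(B[i][k] * B[j][k] * D[k] for k in range(j))
            B[i][j] = (R[i][j] - s) / D[j]
        D[i] = R[i][i] - sum(B[i][k] ** 2 * D[k] for k in range(i))
    return B, D

R = [[2.0, 2.0, 2.0],      # illustrative covariance matrix, not the book's
     [2.0, 5.0, 5.0],
     [2.0, 5.0, 9.0]]
B, D = ldl(R)
# verify R_ij = sum_k B_ik * D_k * B_jk
recon = [[sum(B[i][k] * D[k] * B[j][k] for k in range(3)) for j in range(3)]
         for i in range(3)]
print(B, D)
```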
12. Consider the deterministic random signal y_n = 2 cos(ω_1 n + φ), where ω_1 = π/3 and φ is a random phase distributed uniformly over the interval [0, 2π].

(a) Show that y_n satisfies a second-order homogeneous difference equation.

(b) Using the definition R(k) = E[y_{n+k} y_n], show that R(k) = 2 cos(ω_1 k).

(c) Let y = [y_0, y_1, y_2]^T be three consecutive samples. Using the result of part (b), determine the 3×3 autocorrelation matrix R = E[y y^T] and show that it has zero determinant.

(d) Because of the singularity of R, we expect the Cholesky factorization to break down at dimension 3. To see this, carry out the Gram-Schmidt orthogonalization of y, starting with y_0 and ending with y_2, and thereby determine the factorization R = B R_ɛɛ B^T. Is the result consistent with part (a)?

13. (a) Let R(k) be the autocorrelation function of a stationary random signal y_n. Express the autocorrelation matrix of the random vector y = [y_n; y_{n+k}] in terms of R(k). Then, show the general inequality

|R(k)| ≤ R(0),    for all k

(b) Let u, v be two random variables. Show the Schwarz inequality:

E[uv]^2 ≤ E[u^2] E[v^2]

Hint: Consider y = [u, v]^T.

Supplement: Probability and Statistics Problems

1. (a) Let x be a zero-mean gaussian random variable with variance σ_x^2. Show that

E[x^4] = 3σ_x^4

(b) Let x = [x_1, x_2, ..., x_N]^T be a block of mutually uncorrelated zero-mean gaussian random variables, each with variance σ_x^2. Using the above result, show

E[x_i x_j x_k x_l] = σ_x^4 (δ_ij δ_kl + δ_ik δ_jl + δ_il δ_jk)

Show also that their covariance matrix is R_xx = E[x x^T] = σ_x^2 I, where I is the N×N identity matrix.

(c) Suppose the above N random variables x are mixed up by an arbitrary invertible linear transformation y = Bx, resulting in the new set of gaussian random variables y = [y_1, y_2, ..., y_N]^T. Let R = E[y y^T] be their covariance matrix. Show that

R = σ_x^2 B B^T

(d) Show the analogous result of part (b):

E[y_i y_j y_k y_l] = R_ij R_kl + R_ik R_jl + R_il R_jk

2.
An estimate of the mean m of N independent, identically distributed random variables {y_1, y_2, ..., y_N} of variance σ^2 can be formed by the weighted sum

m̂ = h_1 y_1 + h_2 y_2 + ... + h_N y_N

Determine expressions for the mean and variance of m̂, that is, the quantities E[m̂] and var(m̂). What are the constraints on the weights h_i in order for m̂ to be an unbiased estimate of m? What are the optimal choices for these weights if, in addition, it is required that the variance var(m̂) be minimum?

3. The sample mean of N independent gaussian random variables {y_1, y_2, ..., y_N} of mean m and variance σ^2 is given by
m̂ = (1/N)(y_1 + y_2 + ... + y_N)

First, show that m̂ is unbiased and its variance is var(m̂) = σ^2/N. Then, show that the probability density of m̂ is

p(m̂) = (N^{1/2} / ((2π)^{1/2} σ)) exp[-N(m̂ - m)^2 / (2σ^2)]

Moreover, show that as N → ∞, this density converges to the deterministic delta-function density p(m̂) → δ(m̂ - m).

4. Consider N independent gaussian random variables {y_1, y_2, ..., y_N} of mean m and variance σ^2. The sample variance is defined as

σ̂^2 = (1/N) Σ_{i=1}^{N} (y_i - m̂)^2

where m̂ is the sample mean as defined above. Show that the mean and variance of the sample variance are given by

E[σ̂^2] = ((N-1)/N) σ^2,    var(σ̂^2) = (2(N-1)/N^2) σ^4

Note: This is somewhat lower than the CR lower bound 2σ^4/N. But this is no contradiction, because the CR bound applies to unbiased estimators and the above is slightly biased.

5. Continuing with the previous problem, we can form an unbiased estimator of the variance, the square of the sample standard deviation:

s^2 = (1/(N-1)) Σ_{i=1}^{N} (y_i - m̂)^2

Therefore, s^2 = N σ̂^2/(N-1). Show that its mean and variance are

E[s^2] = σ^2,    var(s^2) = 2σ^4/(N-1)

This does satisfy the CR bound.

6. Next, we determine that the probability distribution of s^2 is a χ^2-distribution with (N-1) degrees of freedom. In the definition of s^2, there are N squared terms (y_i - m̂)^2, yet we divided by (N-1), not N. These terms are not mutually independent, because of the presence of m̂. Using these dependencies, one can express s^2 as a sum of (N-1) independent squared terms, as follows.

(a) Consider the following linear transformation (known as Helmert's transformation) from the set {y_1, ..., y_N} to a new set {z_1, ..., z_N}:

z_i = c_i (y_1 + y_2 + ... + y_i - i y_{i+1}),    i = 1, 2, ..., N-1
z_N = c_N (y_1 + y_2 + ... + y_N)

Determine the scale factors c_i in order for the z_i to have unit variance.

(b) Then, show that the z_i have zero mean and are mutually uncorrelated:

E[z_i z_j] = δ_ij,    i, j = 1, 2, ..., N

(c) Then, show that the linear transformation preserves the sum of the squares,

Σ_{i=1}^{N} z_i^2 = (1/σ^2) Σ_{i=1}^{N} y_i^2

therefore, it is an orthogonal transformation.
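The two facts var(m̂) = σ^2/N and E[σ̂^2] = ((N-1)/N)σ^2 from Problems 3-4 are easy to confirm by simulation. A Monte Carlo sketch, with N, σ, and the trial count chosen arbitrarily:

```python
import random

# Monte Carlo sketch: check var(mhat) ~ sigma^2/N and the downward bias
# E[sigma_hat^2] ~ (N-1)/N * sigma^2 of the sample variance.
random.seed(1)
N, trials, sigma = 10, 20000, 2.0
means, svars = [], []
for _ in range(trials):
    y = [random.gauss(0.0, sigma) for _ in range(N)]
    mhat = sum(y) / N
    means.append(mhat)
    svars.append(sum((v - mhat) ** 2 for v in y) / N)

var_mhat = sum(m ** 2 for m in means) / trials  # true mean is 0
mean_svar = sum(svars) / trials
print(var_mhat)   # ~ sigma^2/N = 0.4
print(mean_svar)  # ~ (N-1)/N * sigma^2 = 3.6
```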
Finally, show that the sum of the first (N-1) squared terms is

χ^2 = Σ_{i=1}^{N-1} z_i^2 = (1/σ^2) Σ_{i=1}^{N} (y_i - m̂)^2

Thus, the sum of the N squared terms on the right-hand side follows a normalized χ^2-distribution with (N-1) degrees of freedom.

7. The following twenty random numbers come from an unknown probability distribution:

{0.33, 0.52, 2.4, .93, 0.46, 0.44, 0.97, 0.38, 0.48, .29, .82, .23, 0.2, 2.66, .22, 0.4, 0.95, .47, 0.83, 0.43}

Test the hypothesis that the underlying distribution is gaussian with zero mean and unit variance. To do this, perform the χ^2 test by dividing the range of the gaussian distribution into the following six bins:
(-∞, -1.5), (-1.5, -0.5), (-0.5, 0.0), (0.0, 0.5), (0.5, 1.5), (1.5, ∞)

If the i-th bin is the interval (x_{i-1}, x_i), then the theoretically expected number of observations that fall into the i-th bin will be

N_i^th = N [F(x_i) - F(x_{i-1})]

where N is the total number of observations and F(x) is the cdf of the assumed gaussian distribution, that is,

F(x) = (1/√(2π)) ∫_{-∞}^{x} e^{-z^2/2} dz

Let N_i be the actual number of observations that fall into the i-th bin. Then, calculate the χ^2 statistic given by

χ^2 = Σ_{i=1}^{B} (N_i - N_i^th)^2 / N_i^th

where B is the number of bins; here, B = 6. This quantity follows a χ^2-distribution with B-1 degrees of freedom. Thus, its mean will be equal to the number of degrees of freedom, namely, B-1. If your calculated χ^2 is near the theoretical mean B-1, then you cannot reject the hypothesis that the pdf was gaussian. Alternatively, you can look up the 95-percent confidence interval of the χ^2-distribution with B-1 degrees of freedom, that is, the interval 0 ≤ χ^2 ≤ χ_1^2 such that the probability of a χ^2 value falling in it is 0.95, or, equivalently, the probability of a χ^2 value falling outside it is only 0.05. Then, if your calculated value of χ^2 falls within that interval, you can conclude with 95-percent confidence that the gaussian assumption cannot be rejected. Note: For B-1 = 5 degrees of freedom, we have χ_1^2 = 11.07.

8. Let F(x) be the cdf of a pdf f(x). Show that the random variable u defined by

u = F(x)

is distributed uniformly over the interval [0, 1). Therefore, random variables x following the pdf f(x) can be generated from a uniform random number generator using the inverse function x = F^{-1}(u). This is the inversion method for generating random numbers from uniform ones (see Appendix A).

9. The Rayleigh probability density finds application in fading communication channels:

p(r) = (r/σ^2) e^{-r^2/(2σ^2)},    for r ≥ 0

Using the inversion method, show how to generate a Rayleigh-distributed random variable r from a uniform variable u.

10.
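The χ^2 computation can be sketched as follows. Because the signs of the printed data did not survive transcription, the observed counts below are illustrative placeholders, not the answer to the problem:

```python
import math

# Chi-square goodness-of-fit sketch for N(0,1) with the six bins above.
def Phi(x):
    # standard normal cdf via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

edges = [-math.inf, -1.5, -0.5, 0.0, 0.5, 1.5, math.inf]
N = 20
expected = [N * (Phi(b) - Phi(a)) for a, b in zip(edges, edges[1:])]

observed = [1, 4, 5, 3, 5, 2]   # placeholder counts summing to 20, not the book's data
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(expected, chi2)
# compare chi2 against the 95% point of chi-square with B-1 = 5 dof (11.07)
```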
The inversion method may also be applied to the problem of generating discrete-valued random variables. Let x be a random variable that can only take one of the discrete values {x_1, x_2, ..., x_M}, with probabilities {p_1, p_2, ..., p_M}, respectively. It is assumed, of course, that the p_i sum to unity. You have available a uniform generator in the interval [0, 1). Explain how to generate the discrete random numbers x from a uniform u.

11. You want to simulate a binary experiment in which only two outcomes can occur, one with probability p and the other with probability 1-p. For example, simulating successive throws of heads or tails, the transmission of bits 0 or 1, or an accept/reject decision. This is the same as the previous problem, with M = 2. The procedure for picking one or the other outcome can be mechanized as follows:

1. Generate a uniform u.
2. If 0 ≤ u < p, then pick the first outcome.
3. If p ≤ u < 1, then pick the second outcome.

Explain why this procedure generates the two outcomes with the correct probabilities p and 1-p.

Note: The optimization method of simulated annealing uses such two-valued random variables. It is an iterative method of minimizing a performance index J(λ), where λ is a vector of parameters with respect to which J must be minimized. Consider two successive choices of the parameter vector, λ_new and λ_old, and compute the change in the performance index: ΔJ = J(λ_new) - J(λ_old). Most iterative minimization algorithms, such as steepest descent or Newton's method, try to continuously keep decreasing J; that is, they demand that the change in λ always be such that ΔJ ≤ 0. This can easily drive λ into a local minimum of J, where the algorithm gets stuck. To alleviate this problem, the so-called Metropolis algorithm of simulated annealing allows J, on occasion, to increase, that is, ΔJ > 0, in order to
jump over such local minima and continue decreasing towards the absolute minimum. The algorithm is as follows: If ΔJ ≤ 0, then accept the change in the parameter vector, λ_old → λ_new. But if ΔJ > 0, then accept the change only with probability p = e^{-βΔJ} and reject it with probability 1-p, where β is a suitable positive constant. Using the results of this problem, it should be clear how one makes the decision of whether to accept or reject the change.

12. Consider the Box-Muller transformation

x = (-2 ln u)^{1/2} cos(2πv),    y = (-2 ln u)^{1/2} sin(2πv)

Show that if {u, v} are independent uniform random variables in the interval [0, 1), then {x, y} are two independent gaussian random variables with zero mean and unit variance.

13. Consider the generalized Box-Muller transformation

x = (-2 ln u)^{1/2} cos(2πv),    y = (-2 ln u)^{1/2} cos(2πv - φ)

where φ is a constant angle. Show that if {u, v} are independent uniform random variables in the interval [0, 1), then {x, y} are two jointly gaussian random variables with zero mean, unit variance, and correlation coefficient E[xy] = cos φ.

14. Let X_1 and X_2 be two independent random variables with cdfs F_1(x) and F_2(x). Show that the random variable X = max(X_1, X_2) has cdf F(x) = F_1(x)F_2(x). Show also that X = min(X_1, X_2) has cdf F(x) = F_1(x) + F_2(x) - F_1(x)F_2(x).

15. The inversion method of generating random variables is convenient only when the cdf F(x) is known in closed form or is easily computed. An alternative method, which works well when the pdf f(x) is known but the cdf F(x) is complicated, as in the gaussian case, is the rejection method. It requires two conditions that are not difficult to meet. First, there exists a so-called majorizing pdf g(x) such that f(x) is bounded from above by

f(x) ≤ c g(x),    for all x

where c is a given constant. Second, it is much easier to generate random variables from the distribution g(x) than from f(x). The following algorithm generates an x distributed according to f(x):
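The basic Box-Muller transformation of Problem 12 can be sketched and sanity-checked numerically (sample size and seed are my choices):

```python
import math, random

# Box-Muller sketch: map independent uniforms (u, v) to a pair of
# independent standard gaussians (x, y).
def box_muller(u, v):
    r = math.sqrt(-2.0 * math.log(u))
    return r * math.cos(2 * math.pi * v), r * math.sin(2 * math.pi * v)

random.seed(7)
pairs = []
for _ in range(50000):
    u = 1.0 - random.random()   # in (0, 1], avoids log(0)
    v = random.random()
    pairs.append(box_muller(u, v))

xs = [p[0] for p in pairs]
mean = sum(xs) / len(xs)
var = sum(x * x for x in xs) / len(xs)
print(mean, var)  # close to 0 and 1
```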
1. Generate an x from the distribution g(x).
2. Generate a y which is uniformly distributed over [0, cg(x)].
3. If y ≤ f(x), then output x; else, go to step 1 and repeat.

To show that this procedure correctly generates x's that are distributed according to f(x), we must show that the conditional density of an x generated as above, given that y ≤ f(x), is equal to the desired density f(x), that is,

p(X = x | Y ≤ f(x)) = f(x)

(a) Show first that necessarily c ≥ 1 and that

p(Y ≤ f(x) | X = x) = f(x)/(c g(x))

which follows from the fact that y is uniform.

(b) Then, integrate the above over all x's generated from g(x) to get

p(Y ≤ f(x)) = 1/c

(c) Finally, use Bayes' rule to determine the quantity

p(X = x | Y ≤ f(x)) = p(Y ≤ f(x) | X = x) p(X = x) / p(Y ≤ f(x))

16. Let y be an M-dimensional gaussian random vector with zero mean and covariance matrix R. Show that the information content, or entropy, of y is given by

S = -∫ p(y) ln p(y) d^M y = (1/2) ln(det R)

up to an unimportant additive constant.

17. Let y = Bɛ be the innovations representation of an M-dimensional gaussian zero-mean vector. Show that its entropy can be written, up to an additive constant, as follows:

S = -∫ p(y) ln p(y) d^M y = (1/2) Σ_{i=1}^{M} ln E_i

where E_i = E[ɛ_i^2] are the variances of the innovations.
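The three-step rejection algorithm above can be sketched with a simple pdf of my choosing, f(x) = 2x on [0, 1], majorized by the uniform g(x) = 1 with c = 2:

```python
import random

# Rejection-method sketch: draw from f(x) = 2x on [0, 1] using the uniform
# majorizer g(x) = 1 with constant c = 2, so f(x) <= c*g(x) everywhere.
def sample_f():
    while True:
        x = random.random()            # step 1: x ~ g
        y = random.uniform(0.0, 2.0)   # step 2: y uniform on [0, c*g(x)]
        if y <= 2.0 * x:               # step 3: accept if y <= f(x)
            return x

random.seed(3)
draws = [sample_f() for _ in range(40000)]
mean = sum(draws) / len(draws)
print(mean)  # E[x] under f is 2/3
```

Note that the acceptance rate is 1/c = 1/2 here, in agreement with part (b).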
18. (a) For any two positive real numbers a and b, show the inequality

a ln(a/b) ≥ a - b

(b) Let y be an M-dimensional random vector. For any two probability densities p(y) and q(y), prove the following information inequality:

∫ p(y) ln[p(y)/q(y)] d^M y ≥ 0

with equality attained when p(y) = q(y).

19. Consider the subset of all M-dimensional probability densities p(y) that have a given mean m and covariance Σ. Show that the density from this subset that has maximum entropy,

S = -∫ p(y) ln p(y) d^M y = max

is the gaussian. Hint: Use Lagrange multipliers to enforce the given constraints. Alternatively, use the information inequality of the previous problem.

20. Let R e_i = λ_i e_i, i = 1, 2, ..., M, be the M eigenvalues and orthonormal eigenvectors of the covariance matrix of an M-dimensional random vector y. Define the M transformed random variables:

z_i = e_i^T y,    i = 1, 2, ..., M

(a) Show that they are mutually uncorrelated with variances λ_i, that is, E[z_i z_j] = λ_i δ_ij.

(b) Show that y can be expanded in terms of the z_i as follows:

y = Σ_{i=1}^{M} z_i e_i

Thus, the randomness of y arises only from the randomness of the z_i, which are uncorrelated. If the eigenvalues are arranged in decreasing order and the first L largest eigenvalues are dominant, then the sum may be approximated by

y ≈ Σ_{i=1}^{L} z_i e_i

Thus, the M-vector y is represented by only L < M parameters, namely, z_1, z_2, ..., z_L. This approximation forms the basis of data compression using the Karhunen-Loeve transform.

(c) Show the equality of quadratic forms

y^T R^{-1} y = Σ_{i=1}^{M} z_i^2/λ_i

(d) Determine the pdf p_z(z) of the vector z = [z_1, z_2, ..., z_M]^T in terms of the pdf p_y(y) (do not assume gaussian distributions). Show that the information content of y is the same as that of z, in the sense that they have equal entropies.

(e) If we denote by B the modal matrix of R, that is, the matrix whose columns are the eigenvectors e_i,

B = [e_1, e_2, ..., e_M]

then show that y is related to the z-basis as y = Bz. Show also that B satisfies B B^T = B^T B = I, and that R = B D B^T, with D = diag(λ_1, λ_2, ..., λ_M).
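The eigen-expansion R = B D B^T of Problem 20 can be sketched in the 2×2 case, where the eigenvectors can be written down by hand. The covariance matrix is an illustrative choice of mine:

```python
import math

# Karhunen-Loeve sketch: for a symmetric 2x2 covariance R, the orthonormal
# eigenvector basis diagonalizes it: R = sum_i lam_i * e_i * e_i^T.
R = [[3.0, 1.0], [1.0, 3.0]]         # illustrative covariance (my choice)
# eigenvectors of this R: (1,1)/sqrt(2) and (1,-1)/sqrt(2)
s = 1.0 / math.sqrt(2.0)
B = [[s, s], [s, -s]]                 # columns are e_1, e_2
lam = [4.0, 2.0]                      # eigenvalues: R e_i = lam_i e_i

# reconstruct R = sum_i lam_i * e_i e_i^T
recon = [[sum(lam[k] * B[i][k] * B[j][k] for k in range(2)) for j in range(2)]
         for i in range(2)]
print(recon)
```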
332:525 Solutions

1. Differentiating E_n with respect to m̂ and setting the gradient to zero gives:

∂E_n/∂m̂ = -2 Σ_{k=0}^{n} (x_k - m̂) = 0

which has solution:

m̂_n = (1/(n+1)) Σ_{k=0}^{n} x_k

In part (b), the required recursions were shown in class. For part (c), we take expectations of both sides of the definition of m̂_n to get:

E[m̂_n] = (1/(n+1)) Σ_{k=0}^{n} E[x_k] = (1/(n+1))(n+1)m = m

Next, we have:

m̂_n - m = (1/(n+1)) Σ_{k=0}^{n} (x_k - m)

The variance of m̂_n will then be

E[(m̂_n - m)^2] = (1/(n+1))^2 Σ_{k=0}^{n} Σ_{j=0}^{n} E[(x_k - m)(x_j - m)]

And, using the iid assumption, we have E[(x_k - m)(x_j - m)] = σ_x^2 δ_kj, which gives for the variance of m̂_n:

E[(m̂_n - m)^2] = (1/(n+1))^2 Σ_{k=0}^{n} Σ_{j=0}^{n} σ_x^2 δ_kj = (1/(n+1))^2 (n+1) σ_x^2 = σ_x^2/(n+1)

2. The gradient of the performance index is now:

∂E_n/∂m̂ = -2 Σ_{k=0}^{n} λ^{n-k} (x_k - m̂) = 0

with optimum solution:

m̂_n = Σ_{k=0}^{n} λ^{n-k} x_k / Σ_{k=0}^{n} λ^{n-k} = (x_n + λx_{n-1} + λ^2 x_{n-2} + ... + λ^n x_0)/(1 + λ + λ^2 + ... + λ^n)

Using the finite geometric series, we may write the denominator as

Σ_{k=0}^{n} λ^{n-k} = 1 + λ + λ^2 + ... + λ^n = (1 - λ^{n+1})/(1 - λ)

which gives for the estimator m̂_n:

m̂_n = ((1-λ)/(1-λ^{n+1})) Σ_{k=0}^{n} λ^{n-k} x_k

Replacing n by n-1 and multiplying by a factor of λ gives:

λ m̂_{n-1} = ((1-λ)/(1-λ^n)) Σ_{k=0}^{n-1} λ^{n-k} x_k

Thus, we can express the sum up to k = n-1 in terms of m̂_{n-1}:

(1-λ) Σ_{k=0}^{n-1} λ^{n-k} x_k = λ(1-λ^n) m̂_{n-1}

Therefore, we obtain the recursion for m̂_n:

m̂_n = ((1-λ)/(1-λ^{n+1})) (Σ_{k=0}^{n-1} λ^{n-k} x_k + x_n) = ((λ - λ^{n+1})/(1-λ^{n+1})) m̂_{n-1} + ((1-λ)/(1-λ^{n+1})) x_n

which can be written in the predictor/corrector form:

m̂_n = m̂_{n-1} + ((1-λ)/(1-λ^{n+1})) (x_n - m̂_{n-1})

In the limit λ → 1, the Kalman gain coefficient tends to the expected limit:

lim_{λ→1} (1-λ)/(1-λ^{n+1}) = 1/(n+1)

On the other hand, if λ is strictly less than one, then the term λ^{n+1} can be ignored after a few iterations, and therefore the recursion becomes essentially the first-order smoother:

m̂_n = m̂_{n-1} + (1-λ)(x_n - m̂_{n-1}) = λ m̂_{n-1} + (1-λ) x_n
3. The difference equation m̂_n = λ m̂_{n-1} + (1-λ) x_n can be solved, assuming zero initial conditions, by convolving the x_n sequence with the filter sequence (1-λ)λ^n. This gives:

m̂_n = (1-λ) Σ_{k=0}^{n} λ^{n-k} x_k

Taking expectations of both sides and using the finite geometric series, we obtain:

E[m̂_n] = (1-λ) Σ_{k=0}^{n} λ^{n-k} m = (1 - λ^{n+1}) m

which tends to m for large n. Thus, m̂_n is asymptotically unbiased. Subtracting the mean E[m̂_n] from m̂_n also gives:

m̂_n - E[m̂_n] = (1-λ) Σ_{k=0}^{n} λ^{n-k} (x_k - m)

Using the same sort of calculation as in Problem 1, we obtain for the variance of m̂_n:

E[(m̂_n - E[m̂_n])^2] = (1-λ)^2 Σ_{k=0}^{n} Σ_{j=0}^{n} λ^{n-k} λ^{n-j} E[(x_k - m)(x_j - m)]
= (1-λ)^2 Σ_{k=0}^{n} Σ_{j=0}^{n} λ^{n-k} λ^{n-j} σ_x^2 δ_kj = σ_x^2 (1-λ)^2 Σ_{k=0}^{n} λ^{2(n-k)}
= σ_x^2 (1-λ)^2 (1 - λ^{2(n+1)})/(1 - λ^2) = ((1-λ)/(1+λ)) σ_x^2 (1 - λ^{2(n+1)})

which in the limit of large n converges to the required result.

4. The theoretical gradient is:

∂E/∂m̂ = -2E[(x_n - m̂)] = -2(m - m̂)

Thus, it vanishes when m̂ = m. The instantaneous gradient is obtained by dropping the expectation value, that is,

∂E/∂m̂ = -2(x_n - m̂)

Putting this into the LMS updating equation gives:

m̂_{n+1} = m̂_n + Δm̂_n = m̂_n - μ ∂E/∂m̂_n = m̂_n + 2μ(x_n - m̂_n)

Setting λ = 1 - 2μ, we rewrite the difference equation as

m̂_{n+1} = m̂_n + 2μ(x_n - m̂_n) = (1 - 2μ) m̂_n + 2μ x_n = λ m̂_n + (1-λ) x_n

5. Problem 1.9: Using x = s + n_1 = s + F n_2 and y = n_2, we find R_yy = E[y^2] = E[n_2^2] and R_xy = E[xy] = E[(s + F n_2) n_2] = F E[n_2^2]. The optimal canceler will be H = R_xy R_yy^{-1} = F E[n_2^2] E[n_2^2]^{-1} = F. The corresponding optimum estimate will be x̂ = Hy = F n_2, and the estimation error e = x - x̂ = (s + F n_2) - F n_2 = s.

Problem 1.10: First determine H. Noting that y = n_2 + ɛs = F^{-1} n_1 + ɛs and using the definition of the gain G, we find R_yy and R_xy:

R_yy = E[yy] = F^{-2} E[n_1^2] + ɛ^2 E[s^2] = (F^{-2} + ɛ^2 G) E[n_1^2]
R_xy = E[xy] = F^{-1} E[n_1^2] + ɛ E[s^2] = (F^{-1} + ɛG) E[n_1^2]

Therefore,

H = R_xy R_yy^{-1} = (F^{-1} + ɛG)/(F^{-2} + ɛ^2 G) = F(1 + ɛFG)/(1 + ɛ^2 F^2 G)

The error output will be

e = x - x̂ = x - Hy = s + n_1 - H(F^{-1} n_1 + ɛs) = (1 - ɛH)s + (1 - H F^{-1}) n_1
Thus, the coefficients a and b will be

a = 1 - ɛH = 1 - ɛF(1 + ɛFG)/(1 + ɛ^2 F^2 G) = (1 + ɛ^2 F^2 G - ɛF - ɛ^2 F^2 G)/(1 + ɛ^2 F^2 G) = (1 - ɛF)/(1 + ɛ^2 F^2 G)
b = 1 - H F^{-1} = (1 + ɛ^2 F^2 G - 1 - ɛFG)/(1 + ɛ^2 F^2 G) = -ɛFG (1 - ɛF)/(1 + ɛ^2 F^2 G) = -ɛFG a

If the coefficient ɛ is known in advance, then the pre-processed signals will be

x' = x = s + n_1 = s + F n_2
y' = y - ɛx = n_2 + ɛs - ɛs - ɛF n_2 = (1 - ɛF) n_2

Thus, y' is correlated only with the noise part of x'. We find

E[x' y'] = F(1 - ɛF) E[n_2^2]
E[y' y'] = (1 - ɛF)^2 E[n_2^2]

and, therefore,

H' = E[x' y'] E[y' y']^{-1} = F/(1 - ɛF)

The corresponding estimate and error output are

x̂ = H' y' = (F/(1 - ɛF)) (1 - ɛF) n_2 = F n_2
e = x - x̂ = s + F n_2 - F n_2 = s

so the noise component of x is canceled completely.

6. For part (a), we have

E[y y^T] = B E[z z^T] B^T  ⟹  E[y y^T]^{-1} = B^{-T} E[z z^T]^{-1} B^{-1}

and, similarly, E[xy] = B E[xz]. The optimal Wiener weights with respect to the two bases are:

h = E[y y^T]^{-1} E[xy],    g = E[z z^T]^{-1} E[xz]

Therefore, they are related by

h = E[y y^T]^{-1} E[xy] = B^{-T} E[z z^T]^{-1} B^{-1} B E[xz] = B^{-T} g

or, h^T = g^T B^{-1}. It follows that the optimal estimate x̂ is invariant under a change of basis:

x̂ = h^T y = g^T B^{-1} B z = g^T z

Parts (b) and (c) were done in class.

7. The estimation error is e = x - x̂ = x - h^T y - b. The minimization conditions for the performance index E = E[e^2] are

∂E/∂h = 2E[e ∂e/∂h] = -2E[e y] = 0
∂E/∂b = 2E[e ∂e/∂b] = -2E[e] = 0

which are equivalent to

E[e y] = E[(x - y^T h - b) y] = E[xy] - E[y y^T] h = r - Rh = 0
E[e] = E[x - h^T y - b] = E[x] - h^T E[y] - b = m - b = 0

where we used E[y] = 0. Thus, h = R^{-1} r and b = m.

8. Part (a) follows from part (b) with the choice Q = I. For part (b), we have

R_zy = E[z y^T] = Q E[y y^T] = Q R_yy  ⟹  H = R_zy R_yy^{-1} = Q

It follows that ẑ = Hy = Qy = z. Part (c) can be shown as follows: note that the subvector y_1 can be obtained from the full vector y by the projection matrix

y_1 = [I, 0][y_1; y_2] = Qy

where I is the identity matrix of the same dimension as y_1. Using part (b) with z = y_1, we find ŷ_1 = y_1. This result can also be shown directly, as follows.
Using the notation R_ij = E[y_i y_j^T], for i, j = 1, 2, we have

E[y_1 y^T] = E[y_1 [y_1^T, y_2^T]] = [E[y_1 y_1^T], E[y_1 y_2^T]] = [R_11, R_12]
E[y y^T] = [[R_11, R_12], [R_21, R_22]]

But noting that

[R_11, R_12] = [I, 0] [[R_11, R_12], [R_21, R_22]]

we obtain

H = E[y_1 y^T] E[y y^T]^{-1} = [R_11, R_12] [[R_11, R_12], [R_21, R_22]]^{-1} = [I, 0]

Thus, ŷ_1 = Hy = [I, 0][y_1; y_2] = y_1.
9. Using part (a) of the previous problem, we have ŷ = y. Therefore, x̂ = c^T ŷ = c^T y and e = x - x̂ = c^T y + v - c^T y = v.

If the estimation is based only on the subvector y_1, then we have ŷ_1 = y_1, and therefore,

x̂_1 = c_1^T ŷ_1 + c_2^T ŷ_2 = c_1^T y_1 + c_2^T ŷ_{2/1}

and for the error output

e_1 = x - x̂_1 = c_1^T y_1 + c_2^T y_2 + v - c_1^T y_1 - c_2^T ŷ_{2/1} = v + c_2^T (y_2 - ŷ_{2/1})

Setting e_2 = y_2 - ŷ_{2/1}, we have e_1 = v + c_2^T e_2, and

E = E[e_1^2] = σ_v^2 + c_2^T E[e_2 e_2^T] c_2

But E[e_2 e_2^T] = R_22 - R_21 R_11^{-1} R_12, which also shows the non-negativity property.

10. Going through the Gram-Schmidt orthogonalization procedure, we find the matrices B and D = R_ɛɛ:

B = [[1, 0, 0], [2, 1, 0], [2, 2, 1]],    D = diag(2, 1, 2)

We also need the inverses

B^{-1} = [[1, 0, 0], [-2, 1, 0], [2, -2, 1]],    R^{-1} = B^{-T} D^{-1} B^{-1}

Thus, the innovations basis is ɛ = B^{-1} y:

ɛ_1 = y_1,    ɛ_2 = y_2 - 2y_1,    ɛ_3 = y_3 - 2y_2 + 2y_1

and conversely, y = Bɛ:

y_1 = ɛ_1,    y_2 = ɛ_2 + 2ɛ_1,    y_3 = ɛ_3 + 2ɛ_2 + 2ɛ_1

For the estimation part, we calculate the h and g weights using the formulas

g = D^{-1} B^{-1} r = [2, -4, 1]^T,    h = B^{-T} g = R^{-1} r = [12, -6, 1]^T

11. The three g-weights are the optimal weights for the lower-order estimation problems, that is,

x̂_1 = g_1 ɛ_1 = 2ɛ_1
x̂_2 = g_1 ɛ_1 + g_2 ɛ_2 = 2ɛ_1 - 4ɛ_2
x̂_3 = g_1 ɛ_1 + g_2 ɛ_2 + g_3 ɛ_3 = 2ɛ_1 - 4ɛ_2 + ɛ_3

Replacing the ɛ_i in terms of the y_i, we get

x̂_1 = 2y_1
x̂_2 = 2y_1 - 4(y_2 - 2y_1) = 10y_1 - 4y_2
x̂_3 = 10y_1 - 4y_2 + (y_3 - 2y_2 + 2y_1) = 12y_1 - 6y_2 + y_3

For the mean-square errors, using the variances of the ɛ_i, {E_1, E_2, E_3} = {2, 1, 2}, and starting with E_0 = σ_x^2 = 30, we get

E_1 = E_0 - g_1^2 E[ɛ_1^2] = 30 - 2^2 · 2 = 22
E_2 = E_1 - g_2^2 E[ɛ_2^2] = 22 - (-4)^2 · 1 = 6
E_3 = E_2 - g_3^2 E[ɛ_3^2] = 6 - 1^2 · 2 = 4

For the prediction part, we want to show that the a_ij coefficients are the matrix elements of B^{-1}. This can be seen in general by writing the expressions of the ɛ_i in terms of the y_i as follows:

ɛ_1 = y_1
ɛ_2 = y_2 - ŷ_{2/1} = y_2 - a_21 y_1
ɛ_3 = y_3 - ŷ_{3/2} = y_3 - a_32 y_2 - a_31 y_1

which is equivalent to ɛ = B^{-1} y with

B^{-1} = [[1, 0, 0], [-a_21, 1, 0], [-a_31, -a_32, 1]]
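The error-recursion arithmetic with E_0 = 30 can be verified in a few lines (a sketch; the variable names are mine):

```python
# Cross-check of the recursion E_i = E_{i-1} - g_i^2 * E[eps_i^2]
# with g-weights [2, -4, 1] and innovation variances [2, 1, 2].
g = [2.0, -4.0, 1.0]
Dvar = [2.0, 1.0, 2.0]     # E[eps_i^2]
E = [30.0]                 # E_0 = sigma_x^2
for gi, di in zip(g, Dvar):
    E.append(E[-1] - gi ** 2 * di)
print(E)  # [30.0, 22.0, 6.0, 4.0]
```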
12. The difference equation is

y_n - 2 cos ω_1 · y_{n-1} + y_{n-2} = 0

Indeed, using y_n = A cos(ω_1 n + φ), we have

2 cos ω_1 · y_{n-1} = 2A cos ω_1 cos(ω_1(n-1) + φ) = A cos(ω_1 n + φ) + A cos(ω_1 n - 2ω_1 + φ) = y_n + y_{n-2}

where we used the trig identity 2 cos a cos b = cos(a+b) + cos(a-b). Using this trig identity again, we obtain for the autocorrelation function:

R(k) = E[y_{n+k} y_n] = A^2 E[cos(ω_1 n + ω_1 k + φ) cos(ω_1 n + φ)]
= (1/2) A^2 E[cos(2ω_1 n + ω_1 k + 2φ) + cos(ω_1 k)] = (1/2) A^2 cos(ω_1 k)

where the first expectation value is zero, as follows from the property

E[cos(φ + θ)] = (1/2π) ∫_0^{2π} cos(φ + θ) dφ = 0

for φ uniform over [0, 2π) and θ deterministic. The 3×3 autocorrelation matrix will be R_ij = E[y_i y_j] = R(i-j). Noting that R(i-j) = R(j-i), we find

R = [[R(0), R(1), R(2)], [R(1), R(0), R(1)], [R(2), R(1), R(0)]]
  = (1/2) A^2 [[1, cos ω_1, cos 2ω_1], [cos ω_1, 1, cos ω_1], [cos 2ω_1, cos ω_1, 1]]

Its determinant is

det R = (1/8) A^6 [1 + 2 cos^2 ω_1 cos 2ω_1 - cos^2 2ω_1 - 2 cos^2 ω_1]

Using the trig identity cos 2ω_1 = 2 cos^2 ω_1 - 1, we can verify that the expression in brackets vanishes. The same result also follows from the observation that the rank of R is two, not three, because each column can be expressed as a linear combination of the other two; for example, the first column is expressible as

[1; cos ω_1; cos 2ω_1] = 2 cos ω_1 [cos ω_1; 1; cos ω_1] - [cos 2ω_1; cos ω_1; 1]

The Gram-Schmidt construction proceeds as follows:

ɛ_0 = y_0
ɛ_1 = y_1 - b_10 ɛ_0
ɛ_2 = y_2 - b_20 ɛ_0 - b_21 ɛ_1

where

b_10 = E[y_1 ɛ_0]/E[y_0 y_0] = R(1)/R(0) = cos ω_1

The quantity E_1 = E[ɛ_1^2] is calculated by squaring the expression y_1 = ɛ_1 + b_10 ɛ_0, taking expectations of both sides, and using E_0 = E[ɛ_0^2] = R(0):

R(0) = E[y_1^2] = E_1 + b_10^2 E_0

or,

E_1 = R(0) - b_10^2 E_0 = R(0)(1 - b_10^2) = R(0)(1 - cos^2 ω_1) = R(0) sin^2 ω_1

Similarly, we find

b_20 = E[y_2 ɛ_0]/E_0 = R(2)/R(0) = cos 2ω_1
b_21 = E[y_2 ɛ_1]/E_1 = E[y_2 (y_1 - b_10 y_0)]/E_1 = (R(1) - b_10 R(2))/E_1
     = (cos ω_1 - cos ω_1 cos 2ω_1)/sin^2 ω_1 = cos ω_1 (1 - cos 2ω_1)/sin^2 ω_1 = cos ω_1 (2 sin^2 ω_1)/sin^2 ω_1 = 2 cos ω_1

Thus, the B matrix will be
        [ 1          0          0 ]
    B = [ cos ω_1    1          0 ]
        [ cos 2ω_1   2 cos ω_1  1 ]

The prediction error E_2 is expected to be zero because y_2 can be predicted exactly from {y_0, y_1}, as follows from the difference equation applied with n = 2:

    y_2 − 2 cos ω_1 · y_1 + y_0 = 0

Indeed, squaring the equation y_2 = ε_2 + b_{20} ε_0 + b_{21} ε_1 and taking expectations of both sides, we get

    R(0) = E[y_2^2] = E_2 + b_{20}^2 E_0 + b_{21}^2 E_1

and solving for E_2,

    E_2 = R(0) − b_{20}^2 E_0 − b_{21}^2 E_1
        = R(0) − cos^2 2ω_1 · R(0) − 4 cos^2 ω_1 · sin^2 ω_1 R(0)
        = (1 − cos^2 2ω_1) R(0) − 4 cos^2 ω_1 sin^2 ω_1 R(0)
        = sin^2 2ω_1 R(0) − sin^2 2ω_1 R(0) = 0

Thus, the D matrix will be

        [ E_0  0    0   ]        [ 1  0          0 ]
    D = [ 0    E_1  0   ] = R(0) [ 0  sin^2 ω_1  0 ]
        [ 0    0    E_2 ]        [ 0  0          0 ]

Finally, one should be able to verify the Cholesky factorization R = B D B^T, which in this case reads as follows (we removed an overall factor of R(0)):

    [ 1         cos ω_1   cos 2ω_1 ]   [ 1         0          0 ] [ 1  0          0 ] [ 1  cos ω_1  cos 2ω_1  ]
    [ cos ω_1   1         cos ω_1  ] = [ cos ω_1   1          0 ] [ 0  sin^2 ω_1  0 ] [ 0  1        2 cos ω_1 ]
    [ cos 2ω_1  cos ω_1   1        ]   [ cos 2ω_1  2 cos ω_1  1 ] [ 0  0          0 ] [ 0  0        1         ]

13. Part (a) follows from part (b) and stationarity. Indeed, applying part (b) to u = y_{n+k} and v = y_n,

    E[y_{n+k} y_n]^2 ≤ E[y_{n+k}^2] E[y_n^2]   ⇒   R(k)^2 ≤ R(0)R(0),  or,  |R(k)| ≤ R(0)

Part (b) can be derived as follows: the autocorrelation matrix of y = [u, v]^T is

    R = E[y y^T] = E[ [u]  [u, v] ] = [ E[u^2]  E[uv]  ]
                     [v]              [ E[vu]   E[v^2] ]

Because this matrix is positive semi-definite, its determinant will be non-negative, that is,

    det R = E[u^2] E[v^2] − E[uv]^2 ≥ 0
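A small Monte-Carlo sketch illustrates both results for the random-phase sinusoid: the estimated autocorrelation matches (1/2) A^2 cos(ω_1 k) and satisfies |R(k)| ≤ R(0), while the difference equation predicts y_2 from y_0, y_1 with zero error for any phase. The values A = 2, ω_1 = 0.9, the sample count, and the seed are assumptions made for the demonstration.

```python
import math
import random

# Assumed demonstration values (not from the problem):
A, w1, N = 2.0, 0.9, 100_000
random.seed(0)

def sample_R(k):
    """Monte-Carlo estimate of R(k) = E[y_{n+k} y_n] for y_n = A cos(w1 n + phi),
    with phi uniform on [0, 2*pi); by stationarity we may take n = 0."""
    acc = 0.0
    for _ in range(N):
        phi = random.uniform(0.0, 2.0 * math.pi)
        acc += A * math.cos(w1 * k + phi) * A * math.cos(phi)
    return acc / N

est = {k: sample_R(k) for k in range(4)}   # estimates of R(0), ..., R(3)
print(est)

# The difference equation predicts y_2 exactly from y_0, y_1 for any phase:
phi = 0.37   # arbitrary fixed phase
y = [A * math.cos(w1 * n + phi) for n in range(3)]
resid = y[2] - (2.0 * math.cos(w1) * y[1] - y[0])
print(resid)
```

The residual of the deterministic prediction is zero to machine precision, which is the numerical counterpart of E_2 = 0 above.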